Enhancing Document Clustering Using Term Re-weighting Based on Semantic Features

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Document Clustering Using Term Weights and Class Label Terms Based on Semantic Features

Clustering of class labels can be generated automatically, which is much lower quality than labels specified by human. In this paper, we propose a new enhancing document clustering method using terms of class label and term weights. The terms of class label can well represent the inherent structure of document clusters by non-negative matrix factorization (NMF). It can also improve the quality ...

متن کامل

Term weighting based on document revision history

In real-world information retrieval systems, the underlying document collection is rarely stable or definite. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a...

متن کامل

Enhancing Document Clustering Using Hybrid Models for Semantic Similarity

Different document representation models have been proposed to measure semantic similarity between documents using corpus statistics. Some of these models explicitly estimate semantic similarity based on measures of correlations between terms, while others apply dimension reduction techniques to obtain latent representation of concepts. This paper proposes new hybrid models that combine explici...

متن کامل

Document Clustering using Weighting and Labels based on Inherent Structure of Document

In classic document clustering, documents appear terms frequency without considering the semantic information of each document (i.e., vector model). The property of vector model may be incorrectly classified documents into different clusters when documents of same cluster lack the shared terms. Recently, to overcome this problem uses knowledge based approaches. However, these approaches have an...

متن کامل

Utilizing Term Proximity based Features to Improve Text Document Clustering

Measuring inter-document similarity is one of the most essential steps in text document clustering. Traditional methods rely on representing text documents using the simple Bag-of-Words (BOW) model which assumes that terms of a text document are independent of each other. Such single term analysis of the text completely ignores the underlying (semantic) structure of a document. In the literatur...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: The Journal of the Korean Institute of Information and Communication Engineering

سال: 2013

ISSN: 2234-4772

DOI: 10.6109/jkiice.2013.17.2.347